Closed Bug 1810951 Opened 2 years ago Closed 2 years ago

Implement Glean-based crash pings in Fenix

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: gsvelto, Assigned: afranchuk)

References

(
URL
)

Details

Attachments

(1 file)

pr 671: Bug 1810951 - Implement Glean-based crash pings in lib-crash 2 years ago Alex Franchuk [:afranchuk] 58 bytes, text/x-github-pull-request		Details \| Review

Gabriele Svelto [:gsvelto]

Reporter

Description

•

2 years ago

As per title, the idea is to implement the minimal crash ping introduced in bug 1790569. While the ping will have only a small amount of functionality it will allow us to gauge Fenix actual crash rate. It will also be improved in lockstep with the desktop version so that we maintain feature equality between the two.

Alex Franchuk [:afranchuk]

Assignee

Updated

•

2 years ago

Assignee: nobody → afranchuk

Alex Franchuk [:afranchuk]

Assignee

Comment 1

•

2 years ago

•

Edited

I'm becoming familiar with the Firefox for Android code, and it seems like there's a number of possible scopes for this change.

GeckoView: adding crash pings here may best achieve the desired outcome, depending on how you look at it. We could get crash pings from anyone using GeckoView (which may be projects other than Fenix). On one hand, this would cover all Gecko crashes, but on the other, some people using GeckoView may not like telemetry being built-in. However, it could be opt-in. I expect the only crashes that could be reported would be non-fatal ones.

It's worth noting that if we want to capture other (non-Gecko) crashes in Fenix, maybe those could be reported separately (e.g. as a separate crash "product" than GeckoView), which may lend itself better to debugging/metrics tracking.

geckoview doesn't currently contain Glean, however geckoview-omni does.
Android Components lib-crash: the crash library already supports Sentry and Socorro as services, so adding a Glean service here would be appropriate. This is also where the crash count Glean metric is already integrated.

One might consider this equivalent to (1) in some regards, since an app writer (and Fenix) may opt-in by using lib-crash (as long as they're aware it exists).
Fenix: adding crash pings directly to Fenix would be fairly straightforward. The codebase already uses Glean, sends a number of pings, and has existing logic managing crash events. However, currently crash reporting in Fenix is done through (2), so putting it directly in the Fenix code is likely not the best approach.

Gabriele Svelto [:gsvelto]

Reporter

Comment 2

•

2 years ago

My take on this is that 2. is the best choice. Crash pings are per-product so they should not be in GeckoView as that's just the foundation for building a browser (or an app that's not a browser!). Android Components is a good fit because it already has all the plumbing in place and is used by Fenix - but other apps based on GeckoView won't necessarily use it which is fine. That being said the final choice rests on the Fenix team so we'll have to ask for some feedback directly from them.

Alex Franchuk [:afranchuk]

Assignee

Comment 3

•

2 years ago

Okay, I expect they'll agree with your opinion, seeing as that's where all the other crash ping/reporting is handled. I wasn't sure whether GeckoView itself is viewed as an individual product, which is why I brought up the possibility of getting crash information directly from it.

I suppose if someone were using GeckoView and experiencing crashes, our first response (assuming they filed a bug) might be to advise them to add lib-crash so that we can get a better idea of what is going on at the moment of the crash (if it isn't clear enough already).

Travis Long [:travis_]

Comment 4

•

2 years ago

There already is a Glean crash reporting service defined in lib-crash that handles the messier parts of making things work with Glean running in another process.

I would love to see the lib-crash Glean stuff expanded and/or modified to suit the purposes of this bug, please do let me know if I may be of assistance.

Alex Franchuk [:afranchuk]

Assignee

Comment 5

•

2 years ago

I saw that in the README; I'm glad there's some precedent and existing code to handle that edge case. It seems like that's the most appropriate place for it.

Gabriele Svelto [:gsvelto]

Reporter

Comment 6

•

2 years ago

FYI in case it might be useful this is the code that assembled the ping in Fennec:

It was intermingled with the crash reporting code. The bits that invoked the stack walker were here (just for reference as you won't need them for now, and besides we'll probably do things differently):

Alex Franchuk [:afranchuk]

Assignee

Updated

•

2 years ago

Depends on: 1811928

Alex Franchuk [:afranchuk]

Assignee

Comment 7

•

2 years ago

I've temporarily copied the {metrics,pings}.yaml files into my local copy (in lieu of bug 1811928) and have updated the GleanCrashReporterService to persist/send the crash pings. I'm wondering whether this obsoletes the crash_metrics.crash_count metric: is there any advantage to having the labeled_counter when you could presumably aggregate the same results amongst submitted crash pings? Though worth noting: the crash pings currently don't have a metric which encompasses uncaught exceptions. Would we want to classify that as a process_type (I'd say that's technically correct), or should that be reserved for gecko process types and we add another metric for this sort of distinction?

The work so far is nominal, since I can't get the glean parser to run correctly (but there are no other language check errors so it should compile after that is fixed). I see

Execution failed for task ':lib-crash:gleanGenerateMetricsDocsForDebug'.
> Glean documentation generation failed.

* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.
> Run with --scan to get full insights.

which I'll dig into further if I can figure out where to poke the gradle build files to pass those suggested flags.

Alex Franchuk [:afranchuk]

Assignee

Updated

•

2 years ago

URL: https://github.com/afranchuk/firefox-...

Chris H-C :chutten

Comment 8

•

2 years ago

I'd keep crash_count for a least a couple releases to ensure there's agreement. We might find some differences between measuring things two ways (well, we always do, but we might find meaningful differences) that we might've missed without something to compare against.

Alex Franchuk [:afranchuk]

Assignee

Updated

•

2 years ago

URL: https://github.com/afranchuk/firefox-... → https://github.com/mozilla-mobile/fir...

Alex Franchuk [:afranchuk]

Assignee

Comment 9

•

2 years ago

This is implemented and there is a PR open.

Alex Franchuk [:afranchuk]

Assignee

Comment 10

•

2 years ago

Attached file pr 671: Bug 1810951 - Implement Glean-based crash pings in lib-crash — Details

Chris Peterson [:cpeterson]

Updated

•

2 years ago

Severity: -- → N/A

Alex Franchuk [:afranchuk]

Assignee

Updated

•

2 years ago

Depends on: 1815432

Alex Franchuk [:afranchuk]

Assignee

Comment 11

•

2 years ago

The PR has landed.

Status: NEW → RESOLVED

Closed: 2 years ago

Resolution: --- → FIXED

Alex Franchuk [:afranchuk]

Assignee

Updated

•

2 years ago

Updated

•

2 years ago

Blocks: 1851518

Alex Franchuk [:afranchuk]

Assignee

Updated

•

11 months ago

Blocks: 1904847

You need to log in before you can comment on or make changes to this bug.